Web Search
Not registered
Abstract
With billions of freely accessible documents, and a number that keeps growing, one of the major issues raised by the World Wide Web is how to search these documents effectively and efficiently to find those that best suit a user's needs. The purpose of this chapter is to describe the techniques at the core of today's search engines (such as Google, Yahoo!, Microsoft Live Search, or Exalead), that is, mostly keyword search in collections of text documents. We also briefly touch upon other techniques and research issues that may be important for next-generation search engines. The chapter is organized as follows. In Section 1, we discuss the Web and the languages and protocols it relies upon. Section 2 presents the techniques that can be used to retrieve pages from the Web, that is, to crawl it. First-generation search engines, exemplified by AltaVista, relied mostly on the classical information retrieval (IR) techniques described in Section 3. With the advent of Google, other techniques that exploit the graph structure of the Web (see Section 4) have very effectively complemented text IR. Finally, Section 5 briefly discusses currently active research topics about the Web.
Whereas the Internet is a physical network of computers (or hosts) connected to each other all around the world, the World Wide Web (WWW, or the Web for short) is a logical collection of hyperlinked documents shared by the hosts of this network. A hyperlinked document is simply a document with references to other documents of the same collection. Documents on the Web may be either static documents stored on the hard drive of some Internet host, or dynamic documents generated on the fly when the document is accessed. Since dynamic documents can change with each request, the number of documents on the Web is virtually unlimited.
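The pairing described above, classical keyword search complemented by the link structure of hyperlinked documents, can be illustrated with a toy sketch. It assumes a hypothetical four-document collection (`DOCS`), a conjunctive (Boolean AND) keyword query over an inverted index, and a PageRank-style power iteration with a damping factor of 0.85 as the link-based score. PageRank is used here only as one well-known example of graph-based ranking, not as the chapter's own method; all names are illustrative.

```python
from collections import defaultdict
import re

# Hypothetical toy collection: document id -> (text, outgoing links).
DOCS = {
    "a": ("web search engines index text documents", ["b", "c"]),
    "b": ("crawling the web follows hyperlinks", ["c"]),
    "c": ("keyword search over text documents", ["a"]),
    "d": ("dynamic documents are generated on the fly", ["c"]),
}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, (text, _links) in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def keyword_query(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

def pagerank(docs, damping=0.85, iterations=50):
    """Power iteration; a page without valid out-links spreads its score uniformly."""
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for d, (_text, links) in docs.items():
            targets = [t for t in links if t in docs] or list(docs)
            share = damping * rank[d] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

def search(docs, query):
    """Select documents by keywords, then order them by link-based score."""
    index = build_inverted_index(docs)
    ranks = pagerank(docs)
    return sorted(keyword_query(index, query), key=lambda d: -ranks[d])

if __name__ == "__main__":
    print(search(DOCS, "text documents"))  # prints ['c', 'a']
```

Text matching alone returns documents "a" and "c" for the query; the link-based score then puts "c" first because three of the four documents point to it. Real engines combine many more signals, so this only shows how the two families of techniques can be composed.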
When one speaks of the Web, one usually means its public part, which is freely accessible; there are also various private Webs restricted to some community or company, either on private intranets or on the Internet with password-protected pages. Documents, and more generally resources, on the Web are identified by a URL (Uniform Resource Locator), which is a …
Similar resources
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Analysis of users’ query reformulation behavior in Web with regard to Wholistic/analytic cognitive styles, Web experience, and search task type
Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience variables in using the Web. Method: This study is an applied research using survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters, such as backward links, forward links, content, click-through rate, etc. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of a search engine; it represents the degree of the vitality...
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines have used conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the Web, there are many challenges in establishing a general framework for data mining and for retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Identifying Ferdowsi University of Mashhad Graduated Students' Search Strategies during their Information-searching through the Web
Purpose: the aim was to identify users' search strategies and the rate of using search strategies on the web. Method: It is a practical survey. The statistical population included all the postgraduate students in the first semester at Ferdowsi University of Mashhad. 95 students were selected by stratified random sampling method. To gather the data, log files were used. Findings: 12 search strat...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2008